runtime monitor
Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach
Salvi, Aniket, Weiss, Gereon, Trapp, Mario
Autonomous systems that rely on Machine Learning (ML) utilize online fault tolerance mechanisms, such as runtime monitors, to detect ML prediction errors and maintain safety during operation. However, the lack of human-interpretable explanations for these errors can hinder the creation of strong assurances about the system's safety and reliability. This paper introduces a novel fuzzy-based monitor tailored for ML perception components. It provides human-interpretable explanations about how different operating conditions affect the reliability of perception components and also functions as a runtime safety monitor. We evaluated our proposed monitor on naturalistic driving datasets as part of an automated driving case study, assessed its interpretability, and identified a set of operating conditions in which the perception component performs reliably. Additionally, we created an assurance case that links unit-level evidence of *correct* ML operation to system-level *safety*. The benchmarking demonstrated that our monitor achieved a greater increase in safety (i.e., absence of hazardous situations) while maintaining availability (i.e., ability to perform the mission) compared to state-of-the-art runtime ML monitors on the evaluated dataset.
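The core idea lends itself to a compact illustration. Below is a minimal Python sketch of a fuzzy reliability monitor; the operating conditions, membership functions, and rule are illustrative assumptions, not the paper's actual design:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership function on [a, d] with plateau [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def perception_reliability(visibility_m, rain_mm_h):
    """Fuzzy degree to which current conditions support reliable perception."""
    good_visibility = trapezoid(visibility_m, 50, 200, 1e4, 1e4 + 1)
    heavy_rain = trapezoid(rain_mm_h, 2, 8, 1e3, 1e3 + 1)
    # Rule: perception is reliable IF visibility is good AND rain is NOT heavy.
    return min(good_visibility, 1.0 - heavy_rain)

print(perception_reliability(visibility_m=500, rain_mm_h=0.5))  # 1.0 -> trust ML output
print(perception_reliability(visibility_m=80, rain_mm_h=10.0))  # 0.0 -> trigger fallback
```

Because each rule reads as a plain-language statement about operating conditions, the monitor's verdicts remain human-interpretable in a way that raw confidence scores are not.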
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Finland > Southwest Finland > Turku (0.04)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.85)
- Information Technology > Robotics & Automation (0.71)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.71)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Digital Twin Enabled Runtime Verification for Autonomous Mobile Robots under Uncertainty
Betzer, Joakim Schack, Boudjadar, Jalil, Frasheri, Mirgita, Talasila, Prasad
As autonomous robots increasingly navigate complex and unpredictable environments, ensuring their reliable behavior under uncertainty becomes a critical challenge. This paper introduces digital twin-based runtime verification for an autonomous mobile robot to mitigate the impact of uncertainty in the deployment environment. The safety and performance properties are specified and synthesized as runtime monitors using TeSSLa. The integration of the executable digital twin, via the MQTT protocol, enables continuous monitoring and validation of the robot's behavior in real time. We explore the sources of uncertainty, including sensor noise and environment variations, and analyze their impact on the robot's safety and performance. Equipped with ample computational resources, the cloud-hosted digital twin serves as a watchdog model that estimates the actual state, checks the consistency of the robot's actuations, and intervenes to override those actuations if a safety or performance property is about to be violated. The experimental analysis demonstrated the high efficiency of the proposed approach in ensuring the reliability and robustness of the autonomous robot's behavior in uncertain environments and in keeping the actual and expected speeds closely aligned, reducing their difference by up to 41% compared to the default robot navigation control.
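To make the watchdog pattern concrete, here is a minimal Python sketch of the MQTT side of such a digital twin (using paho-mqtt >= 2.0). The topic names, payload fields, and tolerance are assumptions, and the TeSSLa-synthesized property checks are reduced to a single speed-consistency test:

```python
import json
import paho.mqtt.client as mqtt

SPEED_TOLERANCE = 0.1  # m/s; assumed tolerance, not taken from the paper

def on_message(client, userdata, msg):
    # The twin receives the robot's reported state and compares the actual
    # speed against the speed its own model expects (fields are assumed).
    state = json.loads(msg.payload)
    deviation = abs(state["actual_speed"] - state["expected_speed"])
    if deviation > SPEED_TOLERANCE:
        # Property about to be violated: override the robot's actuation.
        client.publish("robot/override",
                       json.dumps({"cmd": "set_speed",
                                   "value": state["expected_speed"]}))

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("robot/state")
client.loop_forever()
```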
- Europe > Switzerland (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- (2 more...)
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Agia, Christopher, Sinha, Rohan, Yang, Jingyun, Cao, Zi-ang, Antonova, Rika, Pavone, Marco, Bohg, Jeannette
Robot behavior policies trained via imitation learning are prone to failure under conditions that deviate from their training data. Thus, algorithms that monitor learned policies at test time and provide early warnings of failure are necessary to facilitate scalable deployment. We propose Sentinel, a runtime monitoring framework that splits the detection of failures into two complementary categories: 1) erratic failures, which we detect using statistical measures of temporal action consistency, and 2) task progression failures, where we use Vision Language Models (VLMs) to detect when the policy confidently and consistently takes actions that do not solve the task. Our approach has two key strengths. First, because learned policies exhibit diverse failure modes, combining complementary detectors leads to significantly higher accuracy in failure detection. Second, using a statistical temporal action consistency measure ensures that we quickly detect when multimodal, generative policies exhibit erratic behavior at negligible computational cost. In contrast, we only use VLMs to detect failure modes that are less time-sensitive. We demonstrate our approach in the context of diffusion policies trained on robotic mobile manipulation domains in both simulation and the real world. By unifying temporal consistency detection and VLM runtime monitoring, Sentinel detects 18% more failures than either of the two detectors alone and significantly outperforms baselines, highlighting the importance of assigning specialized detectors to complementary categories of failure. Qualitative results are made available at https://sites.google.com/stanford.edu/sentinel.
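Sentinel's exact consistency statistic is not reproduced here; the sketch below only illustrates the general idea, using a maximum mean discrepancy (MMD) between action batches sampled from the policy at consecutive timesteps (the threshold and kernel width are assumptions):

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Squared MMD with an RBF kernel between two sets of sampled actions."""
    def k(a, b):
        d = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def erratic(prev_actions, curr_actions, threshold=0.5):
    """Flag erratic behavior when the action distributions sampled at
    consecutive timesteps drift apart."""
    return mmd_rbf(prev_actions, curr_actions) > threshold

rng = np.random.default_rng(0)
steady = rng.normal(0.0, 0.1, (64, 2))   # 64 sampled 2-D actions
jumpy = rng.normal(3.0, 1.0, (64, 2))
print(erratic(steady, rng.normal(0.0, 0.1, (64, 2))))  # False: consistent
print(erratic(steady, jumpy))                          # True: erratic
```

Such a statistic is cheap to evaluate every timestep, which is why the framework reserves the slower VLM queries for the less time-sensitive task progression failures.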
- North America > United States > California > Santa Clara County > Palo Alto (0.24)
- North America > United States > New York > New York County > New York City (0.14)
- Asia > South Korea > Daegu > Daegu (0.04)
- (8 more...)
Diagnostic Runtime Monitoring with Martingales
Hindy, Ali, Luo, Rachel, Banerjee, Somrita, Kuck, Jonathan, Schmerling, Edward, Pavone, Marco
Machine learning systems deployed in safety-critical robotics settings must be robust to distribution shifts. However, system designers must understand the cause of a distribution shift in order to implement the appropriate intervention or mitigation strategy and prevent system failure. In this paper, we present a novel framework for diagnosing distribution shifts in a streaming fashion by deploying multiple stochastic martingales simultaneously. We show that knowledge of the underlying cause of a distribution shift can lead to proper interventions over the lifecycle of a deployed system. Our experimental framework can easily be adapted to different types of distribution shifts, models, and datasets. We find that our method outperforms existing work on diagnosing distribution shifts in terms of speed, accuracy, and flexibility, and validate the efficiency of our model in both simulated and live hardware settings.
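As one concrete instance of the idea, the following sketch builds a standard power martingale over conformal p-values; the paper deploys several martingales in parallel to diagnose the cause of a shift, and the score and p-value construction here are simplified assumptions:

```python
import numpy as np

def conformal_p_values(ref_scores, stream_scores, rng):
    """Randomized conformal p-values: uniform on [0, 1] while the stream
    matches the reference distribution, small once scores drift upward."""
    ref = np.asarray(ref_scores)
    n = len(ref) + 1
    return np.array([((ref > s).sum() + rng.uniform()) / n for s in stream_scores])

def power_martingale(p_values, eps=0.5):
    """Test martingale prod_i eps * p_i**(eps - 1); it stays O(1) under
    uniform p-values and grows without bound under distribution shift."""
    return np.exp(np.cumsum(np.log(eps) + (eps - 1) * np.log(p_values)))

rng = np.random.default_rng(1)
ref = rng.normal(0, 1, 500)                        # in-distribution scores
stream = np.concatenate([rng.normal(0, 1, 100),    # no shift ...
                         rng.normal(2, 1, 100)])   # ... then a mean shift
m = power_martingale(conformal_p_values(ref, stream, rng))
print(m[99], m[-1])   # small before the shift, explodes after it
```

Running several such martingales, each fed by a nonconformity score sensitive to a different kind of shift, is what allows the detector to say not just *that* the distribution shifted but *why*.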
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > San Mateo County > Redwood City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.47)
Closing the Loop on Runtime Monitors with Fallback-Safe MPC
Sinha, Rohan, Schmerling, Edward, Pavone, Marco
When we rely on deep-learned models for robotic perception, we must recognize that these models may behave unreliably on inputs dissimilar from the training data, compromising the closed-loop system's safety. This raises fundamental questions about how we can assess confidence in perception systems and to what extent we can take safety-preserving actions when external environmental changes degrade our perception model's performance. Therefore, we present a framework to certify the safety of a perception-enabled system deployed in novel contexts. To do so, we leverage robust model predictive control (MPC) to control the system using the perception estimates while maintaining the feasibility of a safety-preserving fallback plan that does not rely on the perception system. In addition, we calibrate a runtime monitor using recently proposed conformal prediction techniques to certifiably detect when the perception system degrades beyond the tolerance of the MPC controller, resulting in an end-to-end safety assurance. We show that this control framework and calibration technique allow us to certify the system's safety with orders of magnitude fewer samples than would be required to retrain the perception network when deploying in a novel context on a photo-realistic aircraft taxiing simulator. Furthermore, we illustrate the safety-preserving behavior of the MPC on simulated examples of a quadrotor. We open-source our simulation platform and provide videos of our results at our project page: https://tinyurl.com/fallback-safe-mpc.
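A minimal sketch of the calibration step, assuming a scalar perception-error score and a standard split-conformal quantile; the coupling to the MPC's tolerance is the paper's contribution and is not reproduced here:

```python
import numpy as np

def calibrate_threshold(calib_scores, alpha=0.05):
    """Split-conformal threshold: a fresh in-distribution score exceeds it
    with probability at most alpha (under exchangeability)."""
    n = len(calib_scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(calib_scores)[min(rank, n) - 1]

rng = np.random.default_rng(2)
calib = np.abs(rng.normal(0, 1, 1000))   # perception-error scores, nominal context
tau = calibrate_threshold(calib, alpha=0.05)

def degraded(score):
    """True when the perception error exceeds the calibrated threshold;
    the controller should then switch to the fallback-safe plan."""
    return score > tau

print(tau, degraded(0.5), degraded(5.0))   # -> tau, False, True
```

Because the guarantee comes from the quantile of a modest calibration set rather than from retraining, recalibrating in a new context needs far fewer samples than retraining the perception network.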
Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models
Marco, Alonso, Morley, Elias, Tomlin, Claire J.
In order for robots to safely navigate unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful for discriminating unexpected observations by comparing them against probabilistic predictions. However, the model's capability to correctly distinguish between in- and out-of-training-distribution observations hinges on the accuracy of these predictions, which is primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, where it reliably classifies previously unseen terrains.
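To illustrate the monitoring side (the domain-informed kernel itself is the paper's contribution and is not reproduced), here is a sketch using a plain GP regressor as a stand-in for the GPSSM, with a one-step predictive-interval check in place of receding-horizon predictions; the toy dynamics and thresholds are assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# "Domain knowledge" dataset from a nominal model: x_next = f(x, u) + noise.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (200, 2))                  # (state, input) pairs
y = 0.9 * X[:, 0] + 0.3 * np.sin(X[:, 1]) + rng.normal(0, 0.05, 200)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

def ood_monitor(x, observed_next, n_sigma=3.0):
    """Flag an observation as OoD when it falls outside the GP's predictive
    interval; a one-step stand-in for the paper's receding-horizon check."""
    mean, std = gp.predict(np.atleast_2d(x), return_std=True)
    return abs(observed_next - mean[0]) > n_sigma * std[0]

print(ood_monitor([0.2, 0.1], 0.9 * 0.2 + 0.3 * np.sin(0.1)))  # False: expected
print(ood_monitor([0.2, 0.1], 2.5))                            # True: unexpected dynamics
```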
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems
Bensalem, Saddek, Cheng, Chih-Hong, Huang, Wei, Huang, Xiaowei, Wu, Changshun, Zhao, Xingyu
Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among these challenges, finding a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.
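For intuition about what a statistical guarantee from testing looks like, and why it is hard to achieve in practice, consider the elementary bound below (the paper's point is precisely that a two-step method is needed to make such bounds actually hold; this sketch shows only the bound itself):

```python
from scipy.stats import beta

def failure_rate_upper_bound(failures, trials, confidence=0.99):
    """Clopper-Pearson upper confidence bound on the true failure
    probability after observing `failures` failures in `trials` i.i.d. trials."""
    if failures == trials:
        return 1.0
    return float(beta.ppf(confidence, failures + 1, trials - failures))

# Zero failures in 10,000 independently sampled test scenarios still only
# supports a bound of roughly 4.6e-4 at 99% confidence -- far above the
# much smaller failure rates typically demanded in safety-critical domains.
print(failure_rate_upper_bound(0, 10_000))
```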
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Asia > India > Maharashtra > Pune (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Research Report (0.64)
- Overview (0.46)
An investigation of challenges encountered when specifying training data and runtime monitors for safety critical ML applications
Heyn, Hans-Martin, Knauss, Eric, Malleswaran, Iswarya, Dinakaran, Shruthi
Context and motivation: The development and operation of critical software that contains machine learning (ML) models requires diligence and established processes. Especially the training data used during the development of ML models has a major influence on the later behaviour of the system. Runtime monitors are used to provide guarantees for that behaviour. Question / problem: We see major uncertainty in how to specify training data and runtime monitoring for critical ML models and, by extension, the final functionality of the system. In this interview-based study we investigate the underlying challenges behind these difficulties. Principal ideas/results: Based on ten interviews with practitioners who develop ML models for critical applications in the automotive and telecommunication sectors, we identified 17 underlying challenges in 6 challenge groups related to specifying training data and runtime monitoring. Contribution: The article provides a list of the identified underlying challenges related to the difficulties practitioners experience when specifying training data and runtime monitoring for ML models. Furthermore, interconnections between the challenges were found, and based on these connections, recommendations are proposed to overcome the root causes of the challenges.
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- North America > United States (0.04)
- Personal > Interview (0.66)
- Research Report > New Finding (0.47)
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
- Telecommunications (0.68)
- Law (0.68)
Out-Of-Distribution Detection Is Not All You Need
Guérin, Joris, Delmas, Kevin, Ferreira, Raul Sena, Guiochet, Jérémie
The usage of deep neural networks in safety-critical systems is limited by our ability to guarantee their correct behavior. Runtime monitors are components aiming to identify unsafe predictions and discard them before they can lead to catastrophic consequences. Several recent works on runtime monitoring have focused on out-of-distribution (OOD) detection, i.e., identifying inputs that are different from the training data. In this work, we argue that OOD detection is not a well-suited framework for designing efficient runtime monitors and that it is more relevant to evaluate monitors based on their ability to discard incorrect predictions. We call this setting out-of-model-scope detection and discuss the conceptual differences with OOD. We also conduct extensive experiments on popular datasets from the literature to show that studying monitors in the OOD setting can be misleading: 1. very good OOD results can give a false impression of safety, and 2. comparisons under the OOD setting do not allow identifying the best monitor for detecting errors. Finally, we also show that removing erroneous training data samples helps to train better monitors.
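The shift in evaluation target is easy to state in code. A minimal sketch, where the metric names and the synthetic monitor are illustrative assumptions:

```python
import numpy as np

def oms_metrics(monitor_rejects, model_correct):
    """Score a monitor by its ability to discard incorrect predictions
    (out-of-model-scope detection), ignoring OOD labels entirely."""
    errors = ~model_correct
    caught = (monitor_rejects & errors).sum() / max(errors.sum(), 1)
    wasted = (monitor_rejects & model_correct).sum() / max(model_correct.sum(), 1)
    return {"errors_discarded": caught, "correct_discarded": wasted}

# An in-distribution input can still be misclassified, and an OOD input can
# still be handled correctly; only the (reject, correct) pairing matters.
rng = np.random.default_rng(4)
correct = rng.random(1000) > 0.1                    # model is right 90% of the time
rejects = (~correct) ^ (rng.random(1000) < 0.05)    # near-ideal monitor, 5% noise
print(oms_metrics(rejects, correct))
```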
Unifying Evaluation of Machine Learning Safety Monitors
Guerin, Joris, Ferreira, Raul Sena, Delmas, Kevin, Guiochet, Jérémie
With the increasing use of Machine Learning (ML) in critical autonomous systems, runtime monitors have been developed to detect prediction errors and keep the system in a safe state during operation. Monitors have been proposed for different applications involving diverse perception tasks and ML models, and specific evaluation procedures and metrics are used in different contexts. This paper introduces three unified safety-oriented metrics, representing the safety benefits of the monitor (Safety Gain), the remaining safety gaps after using it (Residual Hazard), and its negative impact on the system's performance (Availability Cost). Computing these metrics requires defining two return functions, representing how a given ML prediction will impact expected future rewards and hazards. Three use cases (classification, drone landing, and autonomous driving) are used to demonstrate how metrics from the literature can be expressed in terms of the proposed metrics. Experimental results on these examples show how different evaluation choices impact the perceived performance of a monitor. As our formalism requires us to formulate explicit safety assumptions, it allows us to ensure that the evaluation conducted matches the high-level system requirements.
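A sketch of how the three metrics might be computed once the two return functions are defined; the aggregation below (simple means over evaluation scenarios) is an assumption for illustration, not the paper's exact formalism:

```python
import numpy as np

def unified_metrics(hazard_return, reward_return,
                    hazard_return_monitored, reward_return_monitored):
    """hazard_return / reward_return: expected future hazards / rewards per
    scenario for the unmonitored system; *_monitored: the same quantities
    with the monitor active (e.g., rejections trigger a safe fallback)."""
    safety_gain = np.mean(hazard_return) - np.mean(hazard_return_monitored)
    residual_hazard = np.mean(hazard_return_monitored)
    availability_cost = np.mean(reward_return) - np.mean(reward_return_monitored)
    return safety_gain, residual_hazard, availability_cost

# A monitor that removes most hazards at a small cost in mission reward:
sg, rh, ac = unified_metrics(hazard_return=[1.0, 0.0, 1.0, 1.0],
                             reward_return=[1.0, 1.0, 1.0, 1.0],
                             hazard_return_monitored=[0.0, 0.0, 1.0, 0.0],
                             reward_return_monitored=[0.8, 1.0, 0.9, 0.8])
print(sg, rh, ac)   # 0.5, 0.25, 0.125
```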
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > United Kingdom > England > North Yorkshire > York (0.04)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.88)
- Information Technology > Robotics & Automation (0.88)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)